

ABSTRACT 

A  set  of  macros  to  enable  users  to  retain  numerical  significance 
during  critical  phases  of  a  calculation  is  presented  along  with  the 
philosophy  behind  their  conception.   The  macros  are  designed  for  speed 
and  have  an  average  accuracy  of  at  least  92  binary  places. 
1.   INTRODUCTION

These  augmented  precision  algorithms  for  ILLIAC  IV  are  presented 
with  two  aims  in  mind: 

-  To provide the general user with a method of keeping numerical
significance (and therefore accuracy) even though he is dealing
with numbers of very disparate size.

-  To record these augmented significance algorithms so that
they may be modified by future ILLIAC IV users.

Their function is thus different from the algorithms described by
Yasui [1] in the sense that they do not maintain 96 binary places of
accuracy,  but  only  the  accuracy  desired  by  the  user  up  to  about  92 
places.   As  a  consequence,  they  are  faster  and  less  bulky  than  full 
double  precision  routines,  and  use  neither  CU  instructions  nor  bits  of 
the  RGD.   The  CU  and  mode  bits  are  thus  left  free  to  control  the  main 
algorithm. 

Because these algorithms are directed toward maintaining significance,
routines are also provided for obtaining augmented numbers from
sums and products of single precision numbers.   A routine is provided
for  rounding  augmented  precision  numbers  and  converting  them  to  single 
precision. 

No facilities are provided for reading or printing augmented precision
numbers.   Indeed it is envisioned that users will use them only
temporarily in their programs to ensure accuracy during certain critical
parts of a calculation, and convert augmented precision results to
single precision before reaching the end of their algorithm.

The  initiated  reader  may  observe  that  more  elegant  macros  may  be 
achieved  by  using  "conditional  compile"  facilities,  but  since  the 


paper  is  written  for  the  general  user,  our  aim  is  clarity  at  the 
expense  of  elegance  whenever  the  latter  would  make  the  method  less 
comprehensible. 


2.   REPRESENTATION 

A row of augmented precision numbers is represented by two rows of
ordinary ILLIAC IV 64-bit floating point numbers.   One word ($a^h$), the
more significant part, is not necessarily normalized, but its exponent
is the true exponent for the whole augmented number.   The other word
($a^l$) is the less significant or adjustment part of the number.   Its
exponent is usually about 48 less than the exponent of $a^h$, and its
mantissa may be of different sign from that of $a^h$; but when the
mantissa of $a^l$ is aligned so that it is of the same sign as $a^h$ and
its exponent is exactly 48 less than the exponent of $a^h$, then
$(a^h, a^l) = a^h + a^l$ represents a double precision number with the
exponent of $a^h$.

However,  this  "aligned"  representation  is  rarely  necessary,  and  is 
not  explicitly  maintained  by  the  augmented  precision  algorithms.   It 
is  the  relaxation  of  these  rules  of  representation  that  give  these 
algorithms  their  speed  and  compactness  with  a  small  cost  in  accuracy. 

Throughout this document $a^h$ and $a^l$ are represented as AH and AL
respectively, in program text, the terminal H or L representing the more
or less significant part of the number respectively.   Single precision
numbers are represented by single letters without an H or L ending.

A  single  precision  number  x,  is  expressed  as  (x,  0)  in  augmented 
precision. 

An augmented precision number has a value within a neighborhood $\epsilon$
of a single precision number if the result of normalizing the difference
between the augmented and single precision number has a modulus less
than $\epsilon$.   A similar relationship holds between augmented precision
numbers.
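
The representation can be illustrated with a minimal Python sketch. It is only an analogue: Python floats are IEEE doubles with a 53-bit mantissa rather than ILLIAC IV words with a 48-bit mantissa, and the helper names `value`, `from_single` and `within` are hypothetical, not part of the package.

```python
from fractions import Fraction

def value(hi, lo):
    """Exact value hi + lo of an augmented pair, as a rational number."""
    return Fraction(hi) + Fraction(lo)

def from_single(x):
    """A single precision number x is expressed as the pair (x, 0)."""
    return (x, 0.0)

def within(pair, x, eps):
    """True if the pair lies within a neighborhood eps of the single x."""
    return abs(value(*pair) - Fraction(x)) < Fraction(eps)
```

Exact rational arithmetic is used here only so that the value of a pair can be inspected without further rounding.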


3.   USE 

All augmented arithmetic algorithms presented in this document are
represented as defines which, except for a few cases, operate on an
augmented precision accumulator represented by the labels ACC and AUXACC.
ACC and AUXACC contain the more and less significant parts of the
augmented result.   The accumulator is merely a device to reduce the length
of parameter lists in the defines and, because the need for augmented
precision numbers tends to arise from the need to accumulate sums and
products of single precision numbers, it saves unnecessary access times
as well.

The  parameters  of  the  defines  may  be  any  valid  PE  address  allowable 
in  ASK  [2]  providing  they  do  not  conflict  with  the  register  requirements 
of  the  defines  being  invoked  (See  Appendix  B). 

Augmented  precision  values  may  be  created  by: 

1.  Assigning ACC and AUXACC the augmented precision value directly:

    LIT(3) = 1.0;
    LDA $C3;
    STA ACC;
    CLAR;
    STA AUXACC;

(or more directly:

    LIT(3) = 1.0;
    DLOAD $C3 =0;)

will create the augmented precision number (1.0, 0) in the accumulator.

2.  Using the ILLIAC IV extended add (EAD) [3] or extended subtract
(ESB) instructions,

    DEFINE AUX &A &B =
    LDA &A;
    EAD &B;
    STB ACC;
    STA AUXACC ##;

    DEFINE SUX &A &B =
    LDA &A;
    ESB &B;
    STB ACC;
    STA AUXACC ##;

will form an augmented precision value in the accumulator from the sum
or difference, respectively, of the invocation values of &A and &B.

3.   Using the MUX define to form the product of two single precision
numbers.   Because the ILLIAC IV multiply (ML) instruction does
not produce a true augmented precision product, the define DML is
provided to do so (see Appendix C).   The instructions:

    LDA X;
    DML Y;

leave the more significant part of the augmented precision product
of X and Y in RGR and the less significant part in RGA.   Thus:

    DEFINE MUX &A &B =
    LDA &A;
    DML &B;
    STR ACC;
    STA AUXACC ##;
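
For readers without access to ILLIAC IV, the effect of AUX and MUX can be imitated on a machine with IEEE double arithmetic. The following Python sketch is an analogue under stated assumptions, not the ILLIAC IV method: `two_sum` is Knuth's error-free addition (standing in for EAD) and `two_product` is Dekker's error-free multiplication (standing in for DML). Both names are hypothetical, and exactness assumes round-to-nearest with no overflow or underflow.

```python
from fractions import Fraction

def two_sum(a, b):
    """Return (s, err) with s = fl(a + b) and s + err == a + b exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err

def split(a):
    """Veltkamp split of a double into two non-overlapping halves."""
    c = 134217729.0 * a          # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi

def two_product(a, b):
    """Return (p, err) with p = fl(a * b) and p + err == a * b exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    err = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, err
```

In each case the first element plays the role of ACC and the second the role of AUXACC.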
The defines DCLEAR, DLOAD and DSTORE (which, respectively, clear, load
and store the augmented precision accumulator) are provided so that the
user may work with an augmented ILLIAC IV which, in addition to the
existing ILLIAC IV facilities, has a single augmented precision
accumulator upon which all augmented precision functions act.

This augmented precision accumulator destroys the symmetric properties
of the augmented precision operations in the sense that it
alone  is  normalized  during  the  augmented  precision  operations.   It  is 
truly  an  accumulator  of  sums  or  products,  and  its  use  in  this  role 
automatically  increases  the  accuracy  of  the  calculation  (See  Section  5). 
To  dispense  with  a  unique  accumulator  would  either  increase  the  execution 
times  of  the  augmented  arithmetic  functions  or  reduce  their  accuracy. 

Generally  speaking,  the  results  of  augmented  arithmetic  calculation 
are  not  normalized  after  the  calculation  because  they  will  be  normalized 
during  the  next.   However,  they  are  normalized  before  being  converted 
back  into  single  precision  numbers. 


4.   FUNCTIONS

In addition to the basic functions described in the last section,
the package provides the following facilities:

4.1   Operations involving single precision numbers.

DEFINE DOT &A &B
This operation forms the double length product of &A and &B and
adds it to the augmented precision accumulator.   Its repeated
application forms the scalar product of its successive arguments.

DEFINE DPLUS &A
This operation adds the single precision quantity &A to the
augmented precision accumulator.

DEFINE DMINUS &A
This operation subtracts the single precision quantity &A from the
augmented precision accumulator.

DEFINE DTIMES &A
This function multiplies the augmented precision accumulator by
the single precision number &A.   The result is left in the augmented
precision accumulator.
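
The repeated use of DOT to form a scalar product can be sketched in Python as follows. This is an analogue on IEEE doubles, not the ILLIAC IV code: `two_sum` and `two_product` are the standard Knuth and Dekker error-free transformations, assumed here as stand-ins for EAD and DML, and `dot` is a hypothetical name. The low parts are combined with ordinary rounded adds, in the order used in the error analysis of Section 5.

```python
def two_sum(a, b):
    """Return (s, err) with s + err == a + b exactly."""
    s = a + b
    bb = s - a
    return s, (a - (s - bb)) + (b - bb)

def split(a):
    c = 134217729.0 * a          # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi

def two_product(a, b):
    """Return (p, err) with p + err == a * b exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    return p, ((ah * bh - p) + ah * bl + al * bh) + al * bl

def dot(xs, ys):
    """Accumulate sum(x*y) in an (acc, aux) pair, one DOT-like step per term."""
    acc, aux = 0.0, 0.0                  # DCLEAR
    for x, y in zip(xs, ys):
        p, e = two_product(x, y)         # double length product
        acc, v = two_sum(acc, p)         # high parts, overflow v
        aux = (e + aux) + v              # low parts, rounded adds
    return acc, aux
```

With xs = [1e16, 1.0, -1e16] and ys = [1.0, 1.0, 1.0], ordinary double arithmetic loses the middle term, while the pair returned by `dot` sums to 1.0.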

4.2   Operations on the augmented precision accumulator.
DEFINE  DNEG 
This  operation  negates  the  augmented  precision  accumulator. 

DEFINE  DRECIP 
This  operation  replaces  the  contents  of  the  augmented  precision 
accumulator  by  its  reciprocal.   The  method  used  includes  finding  the 
reciprocal  of  the  more  significant  part  of  the  accumulator  and  refining 
this  with  one  application  of  Newton's  method  for  reciprocation.   No 
other  facilities  for  augmented  precision  division  are  provided  in  the 
package. 
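
The Newton refinement can be checked with exact rational arithmetic. In the Python sketch below, the step is written as -(a*x*x - x - x), which equals x*(2 - a*x); this ordering is an assumption suggested by the dependency list in Appendix B (DTIMES, DMINUS, DNEG), and the function name is hypothetical.

```python
from fractions import Fraction

def newton_reciprocal_step(a, x):
    """One Newton step for 1/a: multiply by x twice, subtract x twice, negate."""
    return -(a * x * x - x - x)

a = Fraction(3)
x0 = Fraction(333, 1000)                 # rough reciprocal of the high part
x1 = newton_reciprocal_step(a, x0)
residual0 = 1 - a * x0                   # error before the step
residual1 = 1 - a * x1                   # error after the step
```

The residual is squared by the step (residual1 equals residual0 squared), which is why a single application roughly doubles the number of correct places.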


DEFINE  DNORM 
This  operation  normalizes  the  augmented  precision  accumulator  by 
normalizing  its  more  significant  half  and  adding  its  less  significant 
half to the result.   This process is repeated once in case the more
significant part was originally zero and the less significant part was
unnormalized.

DEFINE  SINGLE 
This  operation  normalizes  the  augmented  precision  accumulator  and 
then  rounds  it  by  adding  a  quantity  of  the  correct  sign  and  exponent. 
The  single  precision  result  is  left  in  RGA. 
4.3   Operations involving double precision numbers.

DEFINE DADD &AH &AL
The augmented precision number (&AH, &AL) is added to the augmented
precision accumulator.

DEFINE DSUB &AH &AL
The augmented precision number (&AH, &AL) is subtracted from the
augmented precision accumulator.

DEFINE DMULT &AH &AL
The augmented precision accumulator is multiplied by the augmented
precision number (&AH, &AL).   The method used is equivalent to the
following algorithm, except that only the more significant part of
MUX &AL YL is generated and used.

    DEFINE DMULT &AH &AL =
    DSTORE YH YL;
    MUX &AL YL;
    DOT &AL YH;
    DOT &AH YL;
    DOT &AH YH ##;
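
The order of the four partial products can be followed in a Python analogue. As before, this is a sketch on IEEE doubles under the assumption that Knuth's `two_sum` and Dekker's `two_product` stand in for EAD and DML; `mult_pairs` is a hypothetical name. Only the rounded product of the two low parts is kept, as in the text.

```python
from fractions import Fraction

def two_sum(a, b):
    s = a + b
    bb = s - a
    return s, (a - (s - bb)) + (b - bb)

def split(a):
    c = 134217729.0 * a          # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi

def two_product(a, b):
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    return p, ((ah * bh - p) + ah * bl + al * bh) + al * bl

def mult_pairs(a, y):
    """Multiply pair y by pair a, smallest partial product first."""
    ah, al = a
    yh, yl = y
    acc, aux = al * yl, 0.0                       # MUX, high part only
    for u, w in ((al, yh), (ah, yl), (ah, yh)):   # three DOT steps
        p, e = two_product(u, w)
        acc, v = two_sum(acc, p)
        aux = (e + aux) + v
    return acc, aux
```

Adding the smallest terms first keeps their contribution from being swamped before it reaches the low word.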


5.   ACCURACY 

Since the extended precision hardware on ILLIAC IV differs from that
provided on most machines, it is necessary to consider how this affects
the accuracy of the result obtained.

Analysis will be confined to the accumulation of the inner product
$\sum_{i=1}^{n} a_i b_i$ of two vectors.   This is, by far, the most
common use of double precision arithmetic.

It is assumed that the calculations are being performed serially;
i.e., partial sums $S_i = \sum_{j=1}^{i} a_j b_j$ are being calculated
from $S_i = S_{i-1} + a_i b_i$.
Normally, 64 such inner products would be accumulated simultaneously.
However, if summation is being carried out across the PEs by a method
such as the "log sum" [6] technique, the bound on the error is even
smaller (see Linz [5]).   The method analyzed is, therefore, "worst case".

Consider  first  the  accumulation  of  inner  products  on  a  standard 
computer  using  a  word  with  t  bits  mantissa  and  accumulating  products  in 
a  double  length  register  of  2t  bits.   The  final  sum  is  then  rounded  to 
single  precision. 

Let $fl(E)$ represent the evaluation in floating point arithmetic of
the expression $E$.   If $S_i$ is the $i$th partial sum, then

$$S_0 = 0$$

$$S_i = fl(S_{i-1} + fl(a_i \times b_i)), \qquad i = 1, 2, \ldots, n.$$


The product of the single precision operands $a_i$, $b_i$ is first formed,
giving a double length product; this calculation is exact.   Adding this
product to the partial sum $S_{i-1}$, using double length arithmetic,
results in a rounding error:

$$S_i = fl(S_{i-1} + fl(a_i \times b_i))$$

$$= fl(S_{i-1} + a_i b_i)$$

$$= S_{i-1} (1 + \epsilon_i) + a_i b_i (1 + \delta_i).$$


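It may be shown that

$$|\epsilon_i|,\ |\delta_i| \le \tfrac{3}{2}\, 2^{-2t}$$

(see, for instance, Wilkinson [4]), and hence:

$$fl(a^T b) = fl\Big(\sum_{i=1}^{n} a_i b_i\Big) = S_n$$

$$= \sum_{i=1}^{n} a_i b_i (1+\delta_i)(1+\epsilon_{i+1})(1+\epsilon_{i+2}) \cdots (1+\epsilon_n)$$

$$= \sum_{i=1}^{n} a_i b_i (1+\gamma_i)$$

where:

$$(1 - \tfrac{3}{2}\, 2^{-2t})^{n-i+1} \le 1 + \gamma_i \le (1 + \tfrac{3}{2}\, 2^{-2t})^{n-i+1}.$$

Since the factor $(1 \pm \tfrac{3}{2}\, 2^{-2t})^{n-i+1}$ is inconvenient, we make the
assumption that:

$$\tfrac{3}{2}\, n\, 2^{-2t} < 0.1.$$

This will be true for any value of $n$ found in practice since $2^{-2t}$ is very
small.   The inequalities for $\gamma_i$ may then be simplified to:

$$|\gamma_i| < \tfrac{3}{2} (1.06) (n-i+1)\, 2^{-2t} \le \tfrac{3}{2} (1.06)\, n\, 2^{-2t}.$$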

Denoting by $fl_{2,1}(a^T b)$ the result of rounding $fl(a^T b)$ to single
precision, a bound on the absolute error may be found from:

$$|fl_{2,1}(a^T b) - a^T b| \le |fl_{2,1}(a^T b) - fl(a^T b)| + |fl(a^T b) - a^T b|$$

$$\le |fl(a^T b)|\, 2^{-t} + \sum_{i=1}^{n} |a_i b_i \gamma_i|$$

$$= \Big|\sum_{i=1}^{n} a_i b_i (1+\gamma_i)\Big|\, 2^{-t} + \sum_{i=1}^{n} |a_i b_i \gamma_i|$$

$$\le \Big|\sum_{i=1}^{n} a_i b_i\Big|\, 2^{-t} + (1 + 2^{-t}) \sum_{i=1}^{n} |a_i|\,|b_i|\,|\gamma_i|$$

$$\le 2^{-t} |a^T b| + \tfrac{3}{2} (1.1)\, n\, 2^{-2t} \sum_{i=1}^{n} |a_i|\,|b_i|.$$

Therefore, using the Schwarz inequality, the absolute error is bounded
by:

$$2^{-t} |a^T b| + \tfrac{3}{2} (1.1)\, n\, 2^{-2t}\, \|a\|_2\, \|b\|_2.$$

Note that this does not necessarily give a small relative error; however,
unless severe cancellation takes place, the second term is negligible
compared with the first.

In performing the analogous calculations using ILLIAC IV hardware,
the following steps are performed:

5.1  At the $i$th step, $a_i$ and $b_i$ are multiplied giving a double length
product.   No round off error is produced.

$$(p_i^h, p_i^l) = fl(a_i \times b_i) = a_i b_i$$

It is assumed that the $a_i$ and $b_i$ are normalized.   This is essential for
the error analysis.   [Intuitively we want to push as much of the product
into the high precision part as possible.]

5.2  The low precision parts of $p_i$ and $S_{i-1}$ are added using an ordinary
rounded and normalized add.   Call the result $u_i$:

$$u_i = fl(p_i^l + S_{i-1}^l)$$

5.3  $S_{i-1}^h$ is added to $p_i^h$, using the EAD instruction, forming $S_i^h$ with
the overflow going into $v_i$.   Normalizing $S_{i-1}^h$ controls the error.
Intuitively we put as much of the sum in the high order word as possible.
The EAD operation involves no round off error; in fact:

$$S_i^h + v_i = S_{i-1}^h + p_i^h.$$

And hence,

$$S_i^h = \sum_{j=1}^{i} p_j^h - \sum_{j=1}^{i} v_j.$$


$S_i^l$ is formed from $u_i$ and $v_i$ using a rounded and normalized add; i.e.,

$$S_i^l = fl(u_i + v_i) = u_i (1 + \epsilon_{i3}) + v_i (1 + \epsilon_{i4})$$

where $|\epsilon_{i1}|,\ |\epsilon_{i2}|,\ |\epsilon_{i3}|,\ |\epsilon_{i4}| \le 2^{-48}$.

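Using the above equations, we have

$$S_i = S_{i-1} + a_i b_i + E_i$$

where

$$E_i = [p_i^l (1+\epsilon_{i1}) + S_{i-1}^l (1+\epsilon_{i2})](1+\epsilon_{i3}) + v_i (1+\epsilon_{i4}) - [p_i^l + S_{i-1}^l + v_i]$$

$$= p_i^l [(1+\epsilon_{i1})(1+\epsilon_{i3}) - 1] + v_i \epsilon_{i4} + S_{i-1}^l [(1+\epsilon_{i2})(1+\epsilon_{i3}) - 1].$$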

Bounds must be obtained for $|p_i^l|$, $|v_i|$ and $|S_{i-1}^l|$.

$$|p_i^l| \le 2^{-46} |p_i^h| \le 2^{-46} |a_i b_i|.$$
This depends on the fact that $a_i$ and $b_i$ are normalized.   The mantissa
part of $p_i^h$ is at least 1/4, since that of both $a_i$ and $b_i$ is at least 1/2.
The bound then follows because the mantissa part of $p_i^l$ is not greater
than 1 and the exponents of $p_i^h$ and $p_i^l$ differ by exactly 48.

Since it is the low precision part of an extended add of $S_{i-1}^h$ and
$p_i^h$, a bound for $|v_i|$ is

$$|v_i| \le 2^{-46} \max\{|S_{i-1}^h|,\ |p_i^h|\}.$$

This follows since the exponent of $v_i$ is at least 48 less than the larger
of $S_{i-1}^h$ and $p_i^h$, and the mantissa part of $p_i^h$ is at least 1/4 and that of
$S_{i-1}^h$ is at least 1/2 (since it is normalized).   Thus,

$$|v_i| \le 2^{-46} \max\Big\{\Big|\sum_{j=1}^{i-1} p_j^h - \sum_{j=1}^{i-1} v_j\Big|,\ |p_i^h|\Big\}$$

$$\le 2^{-46} \Big(\sum_{j=1}^{n} |p_j^h| + \sum_{j=1}^{n} |v_j|\Big)$$


and

$$\sum_{i=1}^{n} |v_i| \le 2^{-46}\, n\, \Big\{\sum_{j=1}^{n} |a_j b_j| + \sum_{j=1}^{n} |v_j|\Big\},$$

$$\sum_{i=1}^{n} |v_i|\, \{1 - 2^{-46} n\} \le 2^{-46}\, n\, \|a\|_2\, \|b\|_2.$$

Hence, for any reasonable $n$,

$$\sum_{i=1}^{n} |v_i| \le 1.1 \times 2^{-46}\, n\, \|a\|_2\, \|b\|_2.$$

i=l 

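It only remains to find a bound for $S_i^l$.

$$S_i^l = u_i (1+\epsilon_{i3}) + v_i (1+\epsilon_{i4})$$

$$= p_i^l (1+\epsilon_{i1})(1+\epsilon_{i3}) + v_i (1+\epsilon_{i4}) + S_{i-1}^l (1+\epsilon_{i2})(1+\epsilon_{i3})$$

$$= \sum_{j=1}^{i} p_j^l (1+\epsilon_{j1})(1+\epsilon_{j3})(1+\epsilon_{j+1,2})(1+\epsilon_{j+1,3}) \cdots (1+\epsilon_{i2})(1+\epsilon_{i3})$$
$$\quad + \sum_{j=1}^{i} v_j (1+\epsilon_{j4})(1+\epsilon_{j+1,2})(1+\epsilon_{j+1,3}) \cdots (1+\epsilon_{i2})(1+\epsilon_{i3})$$

$$= \sum_{j=1}^{i} p_j^l (1+\gamma_j) + \sum_{j=1}^{i} v_j (1+\delta_j)$$

where, as before, it may be shown that

$$|\gamma_j|,\ |\delta_j| \le \tfrac{3}{2} (1.06) (2n)\, 2^{-48} = \Delta.$$

However, there may be up to $2n$ factors in each term and the work is being
done in single precision.   Therefore,

$$S_i^l = \sum_{j=1}^{i} p_j^l + \sum_{j=1}^{i} v_j + \sum_{j=1}^{i} p_j^l \gamma_j + \sum_{j=1}^{i} v_j \delta_j.$$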


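But since

$$S_i^h = \sum_{j=1}^{i} p_j^h - \sum_{j=1}^{i} v_j,$$

then

$$S_i = S_i^h + S_i^l$$

$$= \sum_{j=1}^{i} p_j^h + \sum_{j=1}^{i} p_j^l + \sum_{j=1}^{i} p_j^l \gamma_j + \sum_{j=1}^{i} v_j \delta_j$$

$$= \sum_{j=1}^{i} a_j b_j + \sum_{j=1}^{i} p_j^l \gamma_j + \sum_{j=1}^{i} v_j \delta_j.$$

Putting $i = n$,

$$fl(a^T b) = S_n = a^T b + \sum_{i=1}^{n} p_i^l \gamma_i + \sum_{i=1}^{n} v_i \delta_i.$$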

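The absolute error is, thus:

$$\Big|\sum_{i=1}^{n} p_i^l \gamma_i + \sum_{i=1}^{n} v_i \delta_i\Big| \le \sum_{i=1}^{n} |p_i^l|\, \Delta + \sum_{i=1}^{n} |v_i|\, \Delta$$

$$\le 2^{-46} \sum_{i=1}^{n} |a_i b_i|\, \Delta + (1.1 \times 2^{-46}\, n\, \|a\|_2\, \|b\|_2)\, \Delta$$

$$\le 1.1 \times 2^{-46}\, \Delta\, (n+1)\, \|a\|_2\, \|b\|_2$$

$$\le 1.1 \times 1.06 \times 3 \times 2^{-46} \times 2^{-48}\, n(n+1)\, \|a\|_2\, \|b\|_2$$

$$\le 2^{-92}\, n(n+1)\, \|a\|_2\, \|b\|_2.$$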

Rounding the result to single precision, as before, and calling the
result $fl_{2,1}(a^T b)$, the following bound is obtained:

$$|fl_{2,1}(a^T b) - a^T b| \le 2^{-48} |a^T b| + 2^{-92}\, n(n+1)\, \|a\|_2\, \|b\|_2.$$

Putting a mantissa size $t = 48$ into the error bound for a standard
machine gives, by comparison:

$$2^{-48} |a^T b| + 1.1\, (\tfrac{3}{2} n)\, 2^{-96}\, \|a\|_2\, \|b\|_2.$$

The   essential   difference  between  these  two   results    is  that   the 
analysis   for  ILLIAC   IV  produces   a  factor  of  n(n  +  1 )    in  the   second  term 
rather  than  a   factor  of  n.      Even   for  large  n,   however,   this   term  should 
be  negligible  unless   severe   cancellation  takes  place.      ILLIAC   IV 
benefits    further   from  a  large  mantissa  size.      The  method  used  is 
therefore   fully   justified. 
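
The two bounds can be compared numerically. The Python sketch below simply evaluates the two expressions above for given vectors; the function names are hypothetical, and each function returns the rounding term and the accumulation term separately so that their relative sizes can be seen.

```python
import math

def norm2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def abs_dot(a, b):
    return abs(sum(x * y for x, y in zip(a, b)))

def bound_standard(a, b, t=48):
    """(rounding term, accumulation term) for a machine with a 2t-bit accumulator."""
    n = len(a)
    first = 2.0 ** -t * abs_dot(a, b)
    second = 1.1 * 1.5 * n * 2.0 ** (-2 * t) * norm2(a) * norm2(b)
    return first, second

def bound_illiac(a, b):
    """(rounding term, accumulation term) from the ILLIAC IV analysis."""
    n = len(a)
    first = 2.0 ** -48 * abs_dot(a, b)
    second = 2.0 ** -92 * n * (n + 1) * norm2(a) * norm2(b)
    return first, second
```

For two vectors of 64 ones, the accumulation term of the ILLIAC IV bound is several hundred times that of the standard bound, yet it remains many orders of magnitude below the rounding term.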
6.   SPACE AND TIMING

Instruction times quoted in the ILLIAC IV Systems Characteristics
and Programming Manual [3] have been used in estimating the execution
times of the appropriate functions.   No FINST/PE overlap has been
assumed.   One clock memory access time has been added for RGS and 7
clocks for memory.   Each define parameter is assumed to imply a memory
access except where actual parameters in the package indicate otherwise.
Space and execution times are presented in the following table:


FUNCTION     SPACE          TIME
             (syllables)    (PE clocks)

DADD             7             72
DCLEAR           3             17
DLOAD        (missing)         32
DMINUS           6             52
DML              8             26
DMULT           60            379
DNEG             6             36
DNORM            9             63
DOT             17            102
DPLUS            6             52
DRECIP          84            547
DSTORE           4             32
DSUB             7             72
DTIMES          29            161
MUX             11             50
SINGLE          18             98

Table 6.1   Space and Execution Time
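
As a worked example of reading the table, the cost of accumulating an n-term inner product can be estimated. The Python sketch below assumes the sequence DCLEAR, then DOT once per term, then SINGLE, and ignores loop control and any other overhead; the function name is hypothetical and the clock counts are those of Table 6.1.

```python
# PE clock counts taken from Table 6.1
CLOCKS = {"DCLEAR": 17, "DOT": 102, "SINGLE": 98}

def inner_product_clocks(n):
    """Estimated PE clocks for DCLEAR, n invocations of DOT, then SINGLE."""
    return CLOCKS["DCLEAR"] + n * CLOCKS["DOT"] + CLOCKS["SINGLE"]
```

For n = 100 this gives 17 + 100 x 102 + 98 = 10315 PE clocks, so the per-term cost of DOT dominates for all but very short sums.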


Users  wishing  to  convert  any  of  these  defines  into  subroutines 
should  consult  Appendix  A. 




7.   CONCLUSION 

These  augmented  precision  routines  are  designed  to  help  the  user 
retain  numerical  significance  during  certain  critical  parts  of  a 
calculation  rather  than  to  provide  double  precision  routines  per  se. 
The  modest  sacrifice  of  accuracy, an  economical  number  representation, 
and  a  certain  symmetry  in  otherwise  associative  operations  are,  the 
authors  felt,  more  than  amply  repaid  by  increased  execution  speed. 

It  remains  to  be  seen  whether  the  general  user  agrees  with  this 
thesis. 
APPENDIX A.   Defines as Subroutines

Two defines, DMULT (30 words) and DRECIP (42 words), occupy enough
space and execute long enough to be made subroutines (if they are to be
invoked more than once) without appreciably degrading their performance.
For  the  purposes  of  this  conversion,  the  augmented  precision  functions 
fall  into  two  classes. 
A.l  Defines  Without  Parameters 

Because  the  augmented  precision  accumulator  is  the  only  operand,  the 
conversion  is  easy  and  the  subroutine  becomes: 

    RECIPS::RECIP;
    EXCHL(3) $ICR;

and may be called by invoking the standard CALL define:

    DEFINE CALL &NAME =
    CLC(3);
    SLIT(3) = &NAME;
    EXCHL(3) $ICR ##;

thus:

    CALL RECIPS;

This complies with subroutine standards.
A.2   Defines With Parameters

There are two useful methods.

A.2.1   The user may declare row variables for use when passing parameters
to the subroutine.   If XH and XL are user declared variables, then the
subroutine may look like the following:

    MULTS::MULT XH XL;
    EXCHL(3) $ICR;

(where XH, XL have been declared XH:BLK 1; XL:BLK 1;) and the calling
sequence would become

    LDA ARGH;
    STA XH;
    LDA ARGL;
    STA XL;
    CALL MULTS;

That is, the subroutine has been made parameterless.

A.2.2   A slightly faster method is to use ACAR0 and ACAR1 to "point"
to the correct arguments.   The calling sequence would then be:

    CLC(0);
    SLIT(0) = ARGH;
    CLC(1);
    SLIT(1) = ARGL;
    CALL MULTSS;

where the subroutine is now:

    MULTSS::MULT 0(0) 0(1);
    EXCHL(3) $ICR;

A little more elegance may then be achieved with:

    DEFINE EXECUTE &NAME &AH &AL =
    CLC(0);
    SLIT(0) = &AH;
    CLC(1);
    SLIT(1) = &AL;
    CALL &NAME ##;

Both methods comply with standard subroutine conventions.
APPENDIX B.   Define Dependencies

Some defines invoke others which, in turn, invoke defines.   The list
below illustrates these dependencies.   Invoked defines are followed by
a list in parentheses of the defines which they invoke.
DMULT  DSTORE,  DOT  (DML,  DADD) 

DOT  DML,  DADD 

DRECIP  DTIMES  (DML),  DMINUS,  DNEG 

DTIMES  DML 

MUX  DNORM 

SINGLE  DNORM 

Two  defines  use  memory  locations  which  must  be  declared  by  the 
user.   The  memory  locations  are: 

DTEMPH:   BLK  1; 

DTEMPL:   BLK  1; 
The  defines  using  those  memory  locations  are: 

DMULT  DTEMPH,  DTEMPL 

DRECIP  DTEMPH 

If  the  user  has  rows  X  and  Y  say,  which  are  available  for  use  by  DMULT 
or  DRECIP,  the  define: 

DEFINE  DTEMPH  =  X  ##,  DTEMPL  =  Y  ##; 
declared  before  DMULT  or  DRECIP  is  invoked  will  cause  them  to  use  X 
and  Y  in  place  of  DTEMPH  and  DTEMPL. 

P.E. registers RGR and RGS are used by the following defines:

DMULT    RGR, RGS
DNORM    RGR
DOT      RGR, RGS
DRECIP   RGR, RGS
DTIMES   RGR, RGS
MUX      RGR, RGS

All defines use RGA and RGB.   None use RGX.
APPENDIX C.

The annotated bodies of the augmented precision defines appear below.
Their use supposes that exponent underflow does not cause the F-bit to
be set.

DEFINE DML &B =

ML &B;

ASH: 

LnR    =,  A  ; 

Lns    hi  i 

Li)A    =3Fno:  ]  f  j 

S  -I  A  i      4  {*  J 

A  n  M    f, r, ; 

m  n  L  /    S  R    » * ; 


DEFINE DTIMES &A =
LDA &A;
DML AUXACC;
LDS ACC;
STR ACC;
STA AUXACC;
LDA &A;
DML $S;
LDS $A;
DADD $R $S ##;


DEFINE DPLUS &A =
LDA ACC;
NORM;
EAD &A;
STB ACC;
ADR AUXACC;
STA AUXACC ##;


DEFINE DMINUS &A =
LDA ACC;
NORM;
ESB &A;
STB ACC;
ADR AUXACC;
STA AUXACC ##;


DEFINE DNORM =
LDA ACC;
NORM;
EAD AUXACC;
LDR $A;
LDA $B;
NORM;
EAD $R;
STB ACC;
STA AUXACC ##;


DEFINE DCLEAR =
CLAR;
STA ACC;
STA AUXACC ##;


DEFINE DLOAD &AH &AL =
LDA &AH;
STA ACC;
LDA &AL;
STA AUXACC ##;
DEFINE DSTORE &AH &AL =
LDA ACC;
STA &AH;
LDA AUXACC;
STA &AL ##;


DEFINE DADD &AH &AL =
LDA ACC;
NORM;
EAD &AH;
STB ACC;
ADR &AL;
ADR AUXACC;
STA AUXACC ##;


DEFINE DSUB &AH &AL =
LDA ACC;
NORM;
ESB &AH;
STB ACC;
SBR &AL;
ADR AUXACC;
STA AUXACC ##;


DEFINE MUX &A &B =
LDA &A;
DML &B;
STR ACC;
STA AUXACC ##;

DEFINE DOT &A &B =
LDA &A;
DML &B;
LDS $A;
DADD $R $S ##;


DEFINE DMULT &AH &AL =
DSTORE DTEMPH DTEMPL;
LDA &AL;
MLRN DTEMPL;
STA ACC;
LDB =0;
STB AUXACC;
DOT &AH DTEMPL;
DOT &AL DTEMPH;
DOT &AH DTEMPH ##;


DEFINE DNEG =
LDA ACC;
CHSA;
STA ACC;
LDA AUXACC;
CHSA;
STA AUXACC ##;


DEFINE DRECIP =

L  o  a  a  •:  c ; 

L'ii<  «  A  j 

L  i  A  =  1  o  h  :  i  > : 
STA DTEMPH;
DTIMES DTEMPH;
DTIMES DTEMPH;
DMINUS DTEMPH;
DMINUS DTEMPH;
DNEG ##;


DEFINE SINGLE =

DNORM;

Lr)A  =7F41  :  16: 

ShAl  '^: 
i.dh  aa; 
L  0  A  A  c  c ; 

AsBt 
S  W  A  P  J 

AHE.X  *3J 
EAO  ACC! 
S* AP  **J 
APPENDIX D.   The Use of Conditional Definition.

The reader will notice that the defines DADD and DPLUS differ by
one parameter and one ASK instruction.   By using the conditional
assembly features of ASK [2], DPLUS and DADD may be combined, as can
DMINUS and DSUB:

DEFINE  DADD  &AH  &AL  = 

LDA  ACC; 

NORM; 

EAD  &AH; 

STB  ACC; 

&IF &EMPTY(&AL) &THEN      % IF SECOND PARAMETER GIVEN

&ELSE ADR &AL; &FI;        % THEN USE IT

ADR  AUXACC; 

STA  AUXACC  ##; 

DEFINE  DSUB  &AH  &AL  = 

LDA  ACC; 

NORM; 

ESB  &AH; 

STB  ACC; 

&IF &EMPTY(&AL) &THEN      % IF SECOND PARAMETER GIVEN

&ELSE SBR &AL; &FI;        % THEN USE IT

ADR  AUXACC; 

STA  AUXACC  ##; 
The  invocation: 

DADD  A; 
is  now  identical  with  the  invocation 
DPLUS  A;
while  the  call 

DADD  A,  B; 
retains  its  original  meaning. 

The  provision  of  defines  with  conditional  bodies  thus  makes  the 
macro  package  easier  to  use  in  the  sense  that  fewer  define  names  need 
to  be  learned  and  understood. 

It  is  worth  mentioning,  however,  that  combining  DTIMES  and  DMULT 
is  not  as  neat  as  the  above  examples: 

DEFINE DMULT &AH &AL =

&IF &EMPTY(&AL) &THEN      % IF SECOND PARAMETER ABSENT
                           % THEN ISSUE CODE FOR DTIMES
LDA &AH;
DML AUXACC;
LDS ACC;
STR ACC;
STA AUXACC;
LDA &AH;
DML $S;
LDS $A;
DADD $R $S;

&ELSE                      % OTHERWISE

DSTORE DTEMPH DTEMPL;      % ISSUE CODE FOR DMULT
LDA &AL;
MLRN DTEMPL;
STA ACC;
LDB = 0;
STB AUXACC;
DOT &AH DTEMPL;
DOT &AL DTEMPH;
DOT &AH DTEMPH   &FI;
However,  the  notational  economy  makes  the  use  of  conditional  compilation 
worthwhile. 

Conditional compilation may also be used to increase the scope of
the define.   For instance, when accumulating sums of positive numbers,
one knows the more significant word of the augmented precision
accumulator is non-zero, and thus some of the instructions in DNORM are
unnecessary.   A version of DNORM that may be used to normalize only as
far as the more significant half of the augmented precision accumulator
or to completely normalize the accumulator might be written as follows:

DEFINE  DNORM  &N  = 

LDA  ACC; 

NORM; 

EAD  AUXACC; 

&IF  &N  &THEN  % IF  &N  IS  ODD,  NORMALIZE  TO  ACC  ONLY  &ELSE

LDR  $A;       % OTHERWISE  NORMALIZE  WHOLE  ACCUMULATOR

LDA  $B; 

NORM; 

EAD  $R;  &FI; 

STB  ACC; 

STA  AUXACC   ##; 
Thus,   the   invocation 


DNORM  1; 
normalizes only as far as the more significant half of the augmented
precision  accumulator,  while 

DNORM  2; 
normalizes  the  whole  augmented  precision  accumulator. 

Use  of  conditional  compilation  facilities  may  thus  enhance  the 
efficiency  of  the  compiled  program  without  the  notational  disadvantage 
of  having  to  provide  a  myriad  of  specific  subroutines  or  macros  for 
every  function  variant. 

The facilities presented here are mainly illustrative and are not
present on the standard library tape.   The user, considering the
augmented precision macros in the light of his own particular
application, will no doubt devise suitable local enhancements.